11 research outputs found

    Adaptive Representations for Tracking Breaking News on Twitter

    Twitter is often the most up-to-date source for finding and tracking breaking news stories. Therefore, there is considerable interest in developing filters for tweet streams in order to track and summarize stories. This is a non-trivial text analytics task as tweets are short, and standard retrieval methods often fail as stories evolve over time. In this paper we examine the effectiveness of adaptive mechanisms for tracking and summarizing breaking news stories. We evaluate the effectiveness of these mechanisms on a number of recent news events for which manually curated timelines are available. Assessments based on ROUGE metrics indicate that adaptive approaches are best suited for tracking evolving stories on Twitter.

    Dimensionality Reduction and Visualisation Tools for Voting Records

    Recorded votes in legislative bodies are an important source of data for political scientists. Voting records can be used to describe parliamentary processes, identify ideological divides between members and reveal the strength of party cohesion. We explore the problem of working with vote data using popular dimensionality reduction techniques and cluster validation methods, as an alternative to more traditional scaling techniques. We present results of dimensionality reduction techniques applied to votes from the 6th and 7th European Parliaments, covering activity from 2004 to 2014.

    First international workshop on recent trends in news information retrieval (NewsIR’16)

    The news industry has gone through seismic shifts in the past decade with digital content and social media completely redefining how people consume news. Readers check for accurate fresh news from multiple sources throughout the day using dedicated apps or social media on their smartphones and tablets. At the same time, news publishers rely more and more on social networks and citizen journalism as a frontline to breaking news. In this new era of fast-flowing instant news delivery and consumption, publishers and aggregators have to overcome a great number of challenges. These include the verification or assessment of a source’s reliability; the integration of news with other sources of information; real-time processing of both news content and social streams in multiple languages, in different formats and in high volumes; deduplication; entity detection and disambiguation; automatic summarization; and news recommendation. Although Information Retrieval (IR) applied to news has been a popular research area for decades, fresh approaches are needed due to the changing type and volume of media content available and the way people consume this content. The goal of this workshop is to stimulate discussion around new and powerful uses of IR applied to news sources and the intersection of multiple IR tasks to solve real user problems. To promote research efforts in this area, we released a new dataset consisting of one million news articles to the research community and introduced a data challenge track as part of the workshop.

    From Detection to Discourse: Tracking Events and Communities in Breaking News

    Online social networks are now an established part of our reality. People no longer rely solely on traditional media outlets to stay informed. Collectively, acts of citizen journalism have transformed news consumers into producers. Keeping up with the overwhelming volume of user-generated content from social media sources is challenging for even well-resourced news organisations. Filtering the most relevant content, however, is not trivial. Significant demand exists for editorial support systems that enable journalists to work more effectively. Social newsgathering introduces many new challenges to the tasks of detecting and tracking breaking news stories. In detection, substantial volumes of data introduce scalability challenges. When tracking developing stories, approaches developed on static collections of documents often fail to capture important changes in the content or structure of data over time. Furthermore, systems tuned on static collections can perform poorly on new, unseen data. To understand significant events, we must also consider the people and organisations who are generating content related to these events. Newsworthy sources are rarely objective and neutral, and in some cases are purposefully created for disinformation, giving rise to the "fake news" phenomenon. An individual's political ideology will inform and influence their choice of language, especially during significant political events such as elections, protests, and other polarising incidents. This thesis presents techniques developed with the intention of supporting journalists who monitor social media for breaking news. It starts with the curation of newsworthy sources, moves on to implementing an alert system for breaking news events and tracking the evolution of these stories over time, and finally explores the language used by different communities to gain insights into the discourse around an event. As well as detecting and tracking significant events, it is of interest to identify the differences in language patterns between groups of people around those events. Distributional semantic language models offer a way to quantify certain aspects of discourse, allowing us to track how different communities use language, thereby revealing their stances on key issues.

    Detecting Attention Dominating Moments Across Media Types - Tweet Stream

    Tweets spanning September 2015. A parallel corpus to the Signal Media 1 Million Articles set for NewsIR'16.

    Event Detection in Twitter using Aggressive Filtering and Hierarchical Tweet Clustering

    Second Workshop on Social News on the Web (SNOW), Seoul, Korea, 8 April 2014. Twitter has become as much of a news medium as a social network, and much research has turned to analysing its content for tracking real-world events, from politics to sports and natural disasters. This paper describes the techniques we employed for the SNOW Data Challenge 2014, described in [16]. We show that aggressive filtering of tweets based on length and structure, combined with hierarchical clustering of tweets and ranking of the resulting clusters, achieves encouraging results. We present empirical results and discussion for two different Twitter streams, focusing on the US presidential elections in 2012 and on the events concerning Ukraine, Syria and Bitcoin in February 2014. Science Foundation Ireland.

    Detecting Attention Dominating Moments Across Media Types

    NewsIR’16 Workshop at ECIR, Padua, Italy, 20 March 2016. In this paper we address the problem of identifying attention dominating moments in online media. We are interested in discovering moments when everyone seems to be talking about the same thing. We investigate one particular aspect of breaking news: the tendency of multiple sources to concentrate attention on a single topic, leading to a collapse in diversity of content for a period of time. In this work we show that diversity at a topic level is effective for capturing this effect in blogs, in news articles, and on Twitter. The phenomenon is present in three distinctly different media types, each with their own unique features. We describe the phenomenon using case studies relating to major news stories from September 2015. Science Foundation Ireland.

    A system for Twitter user list curation

    The ACM Conference on Recommender Systems (RecSys-2012), Dublin, Ireland, 9-13 September 2012. With increased adoption of social networking tools, it is becoming more difficult to extract useful information from the mass of data generated daily by users. Curation of content and sources is an important filter in separating the signal from noise. A good set of credible sources often requires painstaking manual curation, which often yields incomplete coverage of a topic. In this demo, we present a recommender system to aid this process, improving the quality and quantity of sources. The system is highly adaptable to the goals of the curator, enabling some novel uses for curating and monitoring lists of users. Science Foundation Ireland.

    Real time event monitoring with Trident

    RealStream: Real-World Challenges for Data Stream Mining workshop at European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECMLPKDD 2013), Prague, September 23rd to 27th, 2013. Building a scalable, fault-tolerant stream mining system that deals with realistic data volumes presents unique challenges. Considerable work is being done to make the development of such systems simpler, creating high-level abstractions on top of existing systems. Many of the technical barriers can be eliminated by adopting a state-of-the-art interface, such as the Trident API for Storm. We describe a stream mining tool, based on Trident, for monitoring breaking news events on Twitter, which can be extended quickly and scaled easily. Science Foundation Ireland.